# Data modeling with YAML, Jinja, and Python

Cube supports authoring dynamic data models using the [Jinja templating
language][jinja] and Python. This allows de-duplicating common patterns in your data models
as well as dynamically generating data models from a remote data source.

Jinja is supported in all YAML data model files.

## YAML

It is recommended to default to YAML syntax because of its simplicity and readability.

### Folded and literal strings

Sometimes you might want to use multi-line strings in YAML-based data models, e.g.,
in parameters such as `sql` or `description`. It is recommended to use [literal][ref-yaml-literal]
(`|`) string style in such cases as it preserves line breaks.

```yaml
cubes:
  - name: orders
    description: |
      This cube represents customer orders.
      It includes measures for total sales and order count.
    sql: |
      -- Fetch only relevant columns
      SELECT id, created_at, total_amount
      FROM staging.orders
```

## Jinja

Please check the [Jinja documentation][jinja-docs] for details on Jinja syntax.

### Previewing YAML

You can preview the data model code after applying Jinja templates in the <Btn>[Data
Model][ref-data-model-editor]</Btn> editor by clicking <Btn>... → Jinja Preview</Btn>
on files that contain Jinja templates in the sidebar.

<ReferenceBox>

Currently, there's no way to preview the data model code in YAML after applying
Jinja templates in Cube Core. Please [track this issue](https://github.com/cube-js/cube/issues/8134).

</ReferenceBox>

You can also view the resulting data model in [Playground][ref-payground] and [Visual
Model][ref-visual-model]. Also, you can introspect the data model using the
[`/v1/meta` REST API endpoint][ref-meta-api].

### Loops

Jinja supports [looping][jinja-docs-for-loop] over lists and dictionaries. In
the following example, we loop over a list of nested properties and generate a
`LEFT JOIN UNNEST` clause for each one: for each one:

```yaml
{%- set nested_properties = [
  "referrer",
  "href",
  "host",
  "pathname",
  "search"
] -%}

cubes:
  - name: analytics
    sql: |
      SELECT
      {%- for prop in nested_properties %}
        {{ prop }}_prop.value AS {{ prop }}
      {%- endfor %}
      FROM public.events
      {%- for prop in nested_properties %}
      LEFT JOIN UNNEST(properties) AS {{ prop }}_prop ON {{ prop }}_prop.key = '{{ prop }}'
      {%- endfor %}
```

Another useful pattern is to loop over a dictionary of values and generate a
measure for each one, as in the following example:

```yaml
{%- set metrics = {
  "mau": 30,
  "wau": 7,
  "day": 1
} %}

cubes:
  - name: orders
    sql_table: public.orders

    measures:
      {%- for name, days in metrics | items %}
      - name: {{ name | safe }}
        type: count_distinct
        sql: user_id
        rolling_window:
          trailing: {{ days }} day
          offset: start
      {% endfor %}
```

### Macros

Cube data models also support Jinja macros, which allow you to define reusable
snippets of code. You can read more about macros in the [Jinja
documentation][jinja-docs-macros].

In the following example, we define a macro called `dimension()` which generates
a dimension definition in Cube. This macro is then invoked multiple times to
generate multiple dimensions:

```yaml
{# Declare the macro before using it, otherwise Jinja will throw an error. #}
{%- macro dimension(column_name, type='string', primary_key=False) -%}
      - name: {{ column_name }}
        sql: {{ column_name }}
        type: {{ type }}
        {% if primary_key -%}
        primary_key: true
        {% endif -%}
{% endmacro -%}

cubes:
  - name: orders
    sql_table: public.orders

    dimensions:
      {{ dimension('id', 'number', primary_key=True) }}
      {{ dimension('status') }}
      {{ dimension('created_at', 'time') }}
      {{ dimension('completed_at', 'time') }}
```

You could also use macros to generate SQL snippets for use in the `sql`
property:

```yaml
{%- macro cents_to_dollars(column_name, precision=2) -%}
  ({{ column_name }} / 100)::NUMERIC(16, {{ precision }})
{%- endmacro -%}

cubes:
  - name: payments
    sql: |
      SELECT
        id AS payment_id,
        {{ cents_to_dollars('amount') }} AS amount_usd
      FROM app_data.payments
```

### Escaping unsafe strings

[Auto-escaping][jinja-docs-autoescaping] of unsafe string values in Jinja
templates is enabled by default. It means that any strings coming from Python
might get wrapped in quotes, potentially breaking YAML syntax.

You can work around that by using the [`safe` Jinja
filter][jinja-docs-filters-safe] with such string values:

```yaml
cubes:
  - name: my_cube
    description: {{ get_unsafe_string() | safe }}
```

Alternatively, you can wrap unsafe strings into instances of the following
class in your Python code, effectively marking them as safe. This is
particularly useful for library code, e.g., similar to the
[`cube_dbt`][ref-cube-dbt] package.

```python
class SafeString(str):
  is_safe: bool

  def __init__(self, v: str):
    self.is_safe = True
```

## Python

### Template context

You can use Python to declare functions that can be invoked and variables that can be
referenced from within a Jinja template. These functions and variables must be defined
in `model/globals.py` file and registered in the `TemplateContext` instance.

<ReferenceBox>

See the [`TemplateContext` reference][ref-cube-template-context] for more details.

</ReferenceBox>

In the following example, we declare a function called `load_data` that supposedly loads
data from a remote API endpoint. We will then use the function to generate a data model:

```python
from cube import TemplateContext
 
template = TemplateContext()

@template.function('load_data')
def load_data():
   client = MyApiClient("example.com")
   return client.load_data()


class MyApiClient:
  def __init__(self, api_url):
    self.api_url = api_url

  # mock API call
  def load_data(self):
    api_response = {
      "cubes": [
        {
          "name": "cube_from_api",
          "measures": [
            { "name": "count", "type": "count" },
            { "name": "total", "type": "sum", "sql": "amount" }
          ],
          "dimensions": []
        },
        {
          "name": "cube_from_api_with_dimensions",
          "measures": [
            { "name": "active_users", "type": "count_distinct", "sql": "user_id" }
          ],
          "dimensions": [
            { "name": "city", "sql": "city_column", "type": "string" }
          ]
        }
      ]
    }
    return api_response
```

Now that we've decorated our function with the `@template.function` decorator, we can
call it from within a Jinja template. In the following example, we'll call the
`load_data()` function and use the result to generate a data model.

```yaml
cubes:
  {# Here we use the decorated function from earlier #}
  {%- for cube in load_data()["cubes"] %}

  - name: {{ cube.name }}

  {%- if cube.measures is not none and cube.measures|length > 0 %}
    measures:
      {%- for measure in cube.measures %}
      - name: {{ measure.name }}
        type: {{ measure.type }}
      {%- if measure.sql %}
        sql: {{ measure.sql }}
      {%- endif %}
      {%- endfor %}
  {%- endif %}

  {%- if cube.dimensions is not none and cube.dimensions|length > 0 %}
    dimensions:
      {%- for dimension in cube.dimensions %}
      - name: {{ dimension.name }}
        type: {{ dimension.type }}
        sql: {{ dimension.sql }}
      {%- endfor %}
  {%- endif %}
  {%- endfor %}
```

### Imports

In the `model/globals.py` file (or the `cube.py` configuration file), you can
import modules from the current directory. In the following example, we import a function
from the `utils` module and use it to populate a variable in the template context:

```python filename="model/utils.py"
def answer_to_main_question() -> str:
  return "42"
```

```python filename="model/globals.py"
from cube import TemplateContext
from utils import answer_to_main_question

template = TemplateContext()

answer = answer_to_main_question()
template.add_variable('answer', answer)
```
### Dependencies

If you need to use dependencies in your dynamic data model (or your `cube.py`
configuration file), you can list them in the `requirements.txt` file in the root
directory of your Cube deployment. They will be automatically installed with `pip` on
the startup.

<InfoBox>

[`cube` package][ref-cube-package] is available out of the box, it doesn't need to be
listed in `requirements.txt`.

</InfoBox>

If you use dbt for data transformation, you might find the [`cube_dbt`
package][ref-cube-dbt-package] useful. It provides a set of utilities that simplify
defining the data model in YAML [based on dbt models][ref-cube-with-dbt].

If you need to use dependencies with native extensions, build a [custom Docker
image][ref-docker-image-extension].


[jinja]: https://jinja.palletsprojects.com/
[jinja-docs]: https://jinja.palletsprojects.com/en/3.1.x/templates/
[jinja-docs-for-loop]: https://jinja.palletsprojects.com/en/3.1.x/templates/#for
[jinja-docs-macros]:
  https://jinja.palletsprojects.com/en/3.1.x/templates/#macros
[jinja-docs-autoescaping]: https://jinja.palletsprojects.com/en/3.1.x/api/#autoescaping
[jinja-docs-filters-safe]: https://jinja.palletsprojects.com/en/3.1.x/templates/#jinja-filters.safe
[ref-cube-dbt]: /product/data-modeling/reference/cube_dbt
[ref-visual-model]: /product/workspace/visual-model
[ref-docker-image-extension]: /product/deployment/core#extend-the-docker-image
[ref-cube-package]: /product/data-modeling/reference/cube-package
[ref-cube-template-context]: /product/data-modeling/reference/cube-package#templatecontext-class
[ref-cube-dbt-package]: /product/data-modeling/reference/cube_dbt
[ref-cube-with-dbt]: /product/data-modeling/recipes/dbt
[ref-data-model-editor]: /product/workspace/data-model
[ref-payground]: /product/workspace/playground
[ref-meta-api]: /product/apis-integrations/rest-api/reference#base_pathv1meta
[ref-yaml-literal]: https://yaml.org/spec/1.2.2/#812-literal-style
[ref-yaml-folded]: https://yaml.org/spec/1.2.2/#813-folded-style